All the material presented here, to the extent it is original, is available under CC-BY-SA.
09:00-09:45 Introduction and set-up
10:00-11:30 Visualization and mapping with coordinate reference systems
11:30-12:00 Discussion
13:00-14:00 Visualization and mapping with coordinate reference systems (exercises)
14:00-14:15 #30DayMapChallenge https://30daymapchallenge.com/ https://fosstodon.org/web/tags/30DayMapChallenge https://twitter.com/search?q=%252330DayMapChallenge
14:30-15:30 Finding systematic patterns in variable/geographic space
09:00-09:45 Jakub Nowosad (zoom) Q&A Finding systematic patterns in geographic space
10:00-11:00 Finding systematic patterns in variable/geographic space (exercises)
11:00-12:00 External libraries, links to GRASS GIS, raster/terra
13:00-13:30 Geodemographics
13:30-14:30 Links to GIS, raster/terra, geodemographics (exercises)
14:45-15:45 Spatio-temporal cubes, juridicial change
09:00-10:00 Spatio-temporal cubes, juridicial change (exercises)
10:15-11:15 Global and local spatial autocorrelation
11:15-12:00 Data puzzles (support)
13:00-14:15 Global and local spatial autocorrelation (exercises)
14:30-15:30 Residual autocorrelation (exercises)
09:00-10:15 Spatial autocorrelation and machine learning
10:30-11:30 Point pattern analysis
11:30-12:00 Point pattern analysis (exercises)
13:00-14:30 Transport, networks, transmission
14:45-15:45 Transport, networks, transmission (exercises)
Depending on the number of participants wishing to present project outlines (about 25 minutes, 10-15 minutes presentation, 10-15 minutes discussion, the number of slots may vary.
09:00-12:00 Presentation of up to six project outlines, with 20 min. break.
12:30-15:30 Presentation of up to six project outlines, with 20 min. break.
15:30-16:00 Round-up and feedback
Auditorium M is larger than the room initially assigned, but has been upgraded recently. We’ll find out practically how to get into this room (Door B, up stairs, turn left); in principle all teaching rooms are access-card-only, so those of us with will need to help others; perhaps Canvas messages could be used to contact others if you get stuck outside a locked door (the Canvas app can be set up to issue notifications when a message is received). Because anyone with a card can access the room, the room cannot be physically locked, so either we can agree among ourselves on who might stay in the room for each of two halves of the lunch break to avoid rigging down laptops. We’ll have to learn how to live with this, upgraded rooms have both positive and negative advantages.
We can probably stay in one part of the room; as far as I understand, the active camera is that facing the front. Microphones need to be used to be heard when streaming is activated. The streaming times are set at 09:00-12:00 and 13:00-15:45 Monday to Thursday, and 09:00-12:00 and 12:30-16:00 on Friday. Streamed content should be available in the Panopto Video tab in Canvas. It is now only available to those registered on this course and logged in. Raw recordings will be published successively in the same place with little delay for those who are at an awkward time-zone offset. Subsequently, edited recordings cutting out periods with no relevant content will replace the raw recordings. The recordings should show a view of the front of the room, beamer content, and sound captured by microphones.
Those who are not on-site will have to use Canvas messages to interact with us; not ideal but the video system has no two-way option. We are (at the time of writing) so close to on-site delivery, nothing has happened (so far) to force a reversion to hybrid, so delivery will be on-site plus streaming. Should anyone needing to present on Friday be hindered, we can fall back on a zoom session for everybody early the following week if the need arises.
The local weather forecast may be found at https://www.yr.no/en/forecast/daily-table/1-92388/Norway/Vestland/Bergen/Sandviken. Since working in auditorium M may be a bit constricting, we may break out of the schedule if the sun comes out (or even if it doesn’t), and for example walk into town (or part of the way) from mid-afternoon to open up for more informal discussions. For those from outside Bergen, the 7-day public transport ticket is convenient, for example using the Skyss app, see https://www.skyss.no/en for information.
Wifi internet is available throughout the NHH campus. If you already use eduroam, you may have been connected automatically through your home institution. If not, or if your eduroam connection does not work here, please consult https://www.nhh.no/en/about-nhh/it-support/it-support-for-guests/; unfortunately identity confirmation requires a Norwegian phone number, so please assist each other in getting online.
Visualization is one of the key strengths of R. However, visualization involves many choices, and R offers such a wide range of choices that guidance is desirable. There are two major underlying techologies, base graphics and grid graphics, and several toolboxes built on these (trellis graphics and grammar of graphics on grid graphics), in addition to JavaScript widgets for interaction. In addition, much work has been done on the effective use of shapes and colours https://colorbrewer2.org, https://hclwizard.org/ (and https://www.nature.com/articles/nmeth.1618). See also the most recent AltText movement: https://webaim.org/techniques/alttext/ https://support.microsoft.com/en-us/office/everything-you-need-to-know-to-write-effective-alt-text-df98f884-ca3d-456c-807b-1a1fa82f5dc2
We can distinguish between presentation graphics and analytical graphics. Presentation graphics are intended for others and are completed (even if interactive). Analytical graphics may evolve into presentation graphics, but their main purpose is to visualize the data being analysed (see Antony Unwin’s book and (http://www.gradaanwr.net/), and Claus Wilke’s https://clauswilke.com/dataviz/. Many of the researchers who have developed approaches to visualization have been involved with Bell Labs, where S came from https://priceonomics.com/how-william-cleveland-turned-data-visualization/.
As installed, R provides two graphics approaches, one known as base graphics, the other trellis or lattice graphics. Most types of visualization are available for both, but lattice graphics were conceived to handle conditioning, for example to generate matching plots for different categories. Many of these were based on the data-ink ratio, favouring graphics with little or no extraneous detail (Edward Tufte) - see Lukasz Piwek’s blog http://motioninsocial.com/tufte/. There are other approaches, such as Leland Wilkinson’s Grammar of Graphics, implemented in Hadley Wickham’s ggplot2 package, which we will also be using here.
So there are presentation and analytical graphics, and there can be a number of approaches to how best to communicate information in either of those modes. R can create excellent presentation graphics, or provide input for graphics designers to improve for print or other media. What we need now are the most relevant simple analytical graphics for the kinds of data we use.
Histograms and many other kinds of chart require user choices about the number of bins to be used and bin/class intervals. This may also be termed quantization: the division of part of the real line on which we have measured a variable into intervals. This can also apply to combined categories if they are recoded to reduce the number of alternatives to be displayed. Class intervals are much used in thematic cartography, and I’m the author of the classInt package.
Class intervals can be chosen in many ways, and some have been collected for convenience in the classInt package. The first problem is to assign class boundaries to values in a single dimension, for which many classification techniques may be used, including pretty, quantile, natural breaks among others, or even simple fixed values. From there, the intervals can be used to generate colours from a colour palette, using the very nice colorRampPalette() function. Because there are potentially many alternative class memberships even for a given number of classes, choosing a communicative set matters.
We may choose the number of intervals ourselves arbitrarily or after examination of the data, or use provided functions, such as nclass.Sturges(), nclass.scott() or nclass.FD(). In hist(), nclass.Sturges() is used by default. We can also split on sign(), but handling diverging intervals often involves more work.
The default intervals for bins in hist() are pretty(range(x), n = breaks, min.n = 1), where breaks <- nclass.Sturges(x). The function computes a sequence of about n+1 equally spaced ‘round’ values which cover the range of the values in x. The values are chosen like values of coins or banknotes (1, 2, 5, etc.).
If we use the classIntervals() function from classInt, we can pass through arguments to the function called through style=, and note that n will not necessarily be the number of output classes. By default, intervalClosure= is "left", so [-30, -20) means numbers greater than and equal to (>=) -30 and less than (<) -20; [10, 20] is numbers >= 10 and <= 20.
The tmap https://r-tmap.github.io/tmap/ package provides cartographically informed, grammar of graphics (gg) based functionality now, like ggplot2 using grid graphics. John McIntosh tried with ggplot2 https://www.johnmackintosh.net/blog/2017-08-22-simply-mapping/, with quite nice results. I suggested he look at tmap, and things got better https://www.johnmackintosh.net/blog/2017-09-01-easy-maps-with-tmap/, because tmap can switch between interactive and static viewing. tmap also provides direct access to classInt class intervals. Like the sf::plot() method, tmap plotting can use classInt internally and accepts a palette (try looking at tmaptools::palette_explorer() for ColorBrewer palettes, or at the rcartocolor package).
The underlying logic of conditioned graphics is that multiple displays (windows, panes) use the same scales and representational elements for comparison. Using the same scales and representational elements for comparison can be done manually, imposing the same scales, colours and shapes in each plot and laying the plots out in a grid. Trellis graphics automated this in S, and lattice provides similar but enhanced facilities in R with a formula interface. ggplot2 and other packages also provide similar functionalities.
In early R, all the graphics functionality was in base; graphics was split out of base in 1.9.0 (2004-04-12), and grDevices in 2.0.0 (2004-10-04). When R starts now, the graphics and grDevices packages are loaded and attached, ready for use. graphics provides the user-level graphical functions and methods, especially the most used plot() methods that many other packages extend. grDevices provides lower-level interfaces to graphics devices, some of which create files and others display windows. The capabilities() function shows what R itself can offer, including non-graphics capabilities, and we can also check the versions of external software used:
capabilities()
## jpeg png tiff tcltk X11 aqua
## TRUE TRUE TRUE TRUE TRUE FALSE
## http/ftp sockets libxml fifo cledit iconv
## TRUE TRUE FALSE TRUE FALSE TRUE
## NLS Rprof profmem cairo ICU long.double
## TRUE TRUE FALSE TRUE TRUE TRUE
## libcurl
## TRUE
grSoftVersion()
## cairo cairoFT pango
## "1.17.6" "" "1.50.9"
## libpng jpeg libtiff
## "1.6.37" "6.2" "LIBTIFF, Version 4.4.0"
Most of base graphics is vector graphics, but some innovations apply both to base and grid. The gridBase package permits base graphics elements, often created as plot() methods in contributed packages, to be placed in grid graphics displays.
We can check the relative standing of graphics and grid from the CRAN package database, and add lattice and ggplot2 (Suggests are typically in examples):
db <- tools::CRAN_package_db()
types <- c("Depends", "Imports", "Suggests")
pkgs <- c("graphics", "grid", "lattice", "ggplot2")
(tbl <- sapply(types, function(type) sapply(pkgs,
function(pkg) length(db[grep(pkg, db[, type]), 1]))))
## Depends Imports Suggests
## graphics 289 2084 90
## grid 80 800 299
## lattice 117 263 267
## ggplot2 370 2693 1203
class(tbl)
## [1] "matrix" "array"
library(lattice)
barchart(tbl, auto.key=TRUE, horizontal=FALSE)
The grid and lattice entered as Recommended in R 1.5.0 in April 2002, and grid became a base package in 1.8.0 in October 2003. Some changes were made in grid for R 3, but its structure remains very stable. The gridBase and gridGraphics packages provide functions for capturing the state of the current device drawn with base graphics tools. One reason for this is the unsolved problem of testing graphics output for identity, to ensure that the same commands for the same data give the same output; for grid objects this is feasible, but not for base graphics on interactive devices. Over and above the use of grid directly, the general-purpose packages lattice and ggplot2 build on grDevices and grid. In addition, it is worth mentioning the vcd (visualizing categorical data) and vcdExtra packages and a recent book on http://ddar.datavis.ca/.
We can combine grob from different sources
b <- barchart(tbl, auto.key=TRUE,
horizontal=FALSE)
x11()
barplot(t(tbl), legend.text=TRUE,
args.legend=list(x="top", bty="n",
cex=0.8, y.intersp=3))
gridGraphics::grid.echo()
library(grid)
g <- grid.grab()
dev.off()
## png
## 2
grid.newpage()
gridExtra::grid.arrange(g, b, ncol=2)
grid pushes viewports onto a stack, then pops them, see R Graphics and vignettes
grid.rect(gp = gpar(lty = "dashed"))
vp <- viewport(width = 0.5, height = 0.5)
pushViewport(vp)
grid.rect(gp = gpar(col = "grey"))
grid.text("quarter of the page", y = 0.85)
pushViewport(vp)
grid.rect()
grid.text("quarter of the\nprevious viewport")
popViewport(2)
Just reading the print method for ggplot objects shows how close grid is under ggplot2.
ggplot2:::print.ggplot
## function (x, newpage = is.null(vp), vp = NULL, ...)
## {
## set_last_plot(x)
## if (newpage)
## grid.newpage()
## grDevices::recordGraphics(requireNamespace("ggplot2", quietly = TRUE),
## list(), getNamespace("ggplot2"))
## data <- ggplot_build(x)
## gtable <- ggplot_gtable(data)
## if (is.null(vp)) {
## grid.draw(gtable)
## }
## else {
## if (is.character(vp))
## seekViewport(vp)
## else pushViewport(vp)
## grid.draw(gtable)
## upViewport()
## }
## if (isTRUE(getOption("BrailleR.VI")) && rlang::is_installed("BrailleR")) {
## print(asNamespace("BrailleR")$VI(x))
## }
## invisible(x)
## }
## <bytecode: 0x5f933d8>
## <environment: namespace:ggplot2>
There are various lower and higher-level ways of combining graphical output: some are described in https://cran.r-project.org/web/packages/egg/vignettes/Ecosystem.html in the egg package
df <- cbind(expand.grid(pkgs, types), n=c(tbl))
library(ggplot2)
gg <- ggplot(df, aes(x=Var1, y=n, fill=Var2)) + geom_col() +
xlab("") + guides(fill=guide_legend(title="")) +
theme(legend.position="top")
gg2 <- ggplotGrob(gg)
t <- gridExtra::tableGrob(as.data.frame(tbl))
gridExtra::grid.arrange(g, b, gg2, t, ncol=2, nrow=2)
### Coordinate reference systems
https://geocompr.robinlovelace.net/spatial-class.html#crs-intro https://geocompr.robinlovelace.net/reproj-geo-data.html https://clauswilke.com/dataviz/geospatial-data.html#projections https://r-spatial.org/book/02-Spaces.html https://riatelab.github.io/bertin/ https://observablehq.com/@neocartocnrs/hello-bertin-js https://observablehq.com/collection/@neocartocnrs/bertin https://observablehq.com/@neocartocnrs/bertin-js-projections?collection=@neocartocnrs/bertin https://observablehq.com/collection/@d3/d3-geo-projection
library(sf)
## Linking to GEOS 3.11.0, GDAL 3.6.0, PROJ 9.1.0; sf_use_s2() is TRUE
library(sf)
chicago_tracts <- st_read("chicago_tracts.gpkg")
## Reading layer `chicago_tracts' from data source
## `/home/rsb/und/ecs530/ECS530_h22/chicago_tracts.gpkg' using driver `GPKG'
## Simple feature collection with 2195 features and 28 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -88.94229 ymin: 40.73651 xmax: -86.92936 ymax: 42.66976
## Geodetic CRS: NAD83
We’ll use the tmap package with class interval styles from the classInt package:
library(tmap)
tm_shape(chicago_tracts) + tm_fill("med_inc_cv", style="fisher", n=7, title="Coefficient of Variation")
In ESRI documentation, CV thresholds of 12 and 40 percent are proposed for the transformed reported MOE values: https://doc.arcgis.com/en/esri-demographics/data/acs.htm. We’ll create a classified variable (ordered factor):
chicago_tracts$mi_cv_esri <- cut(chicago_tracts$med_inc_cv, c(0, 0.12, 0.40, Inf), labels=c("High", "Medium", "Low"), right=TRUE, include.lowest=TRUE, ordered_result=TRUE)
table(chicago_tracts$mi_cv_esri)
##
## High Medium Low
## 1400 766 29
and map it:
tm_shape(chicago_tracts) + tm_fill("mi_cv_esri", title="Reliability")
As the Low reliability tracts are small in size, the "view" mode for interactive mapping may help:
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(chicago_tracts) + tm_fill("mi_cv_esri", title="Reliability")
tmap_mode("plot")
## tmap mode set to plotting
Or equivalently using mapview, plotting the CV values rather than the ordered factor, which is not yet well-supported:
library(mapview)
mapviewOptions(fgb = FALSE)
mapview(chicago_tracts[,"med_inc_cv"], layer.name="Coefficient of Variation")
The RColorBrewer package gives by permission access to the ColorBrewer palettes accesible from the ColorBrewer website. Note that ColorBrewer limits the number of classes tightly, only 3–9 sequential classes
We can also display all the ColorBrewer palettes:
library(RColorBrewer)
display.brewer.all()
Try exploring alternative class interval definitions and palettes, maybe also visiting http://hclwizard.org/ and its hclwizard() Shiny app, returning a palette generating function on clicking the “Return to R” button:
library(colorspace)
hcl_palettes("sequential (single-hue)", n = 7, plot = TRUE)
pal <- hclwizard()
pal(6)
The end of rainbow discussion is informative:
wheel <- function(col, radius = 1, ...)
pie(rep(1, length(col)), col = col, radius = radius, ...)
opar <- par(mfrow=c(1,2))
wheel(rainbow_hcl(12))
wheel(rainbow(12))
par(opar)
https://clauswilke.com/dataviz/geospatial-data.html https://r-spatial.org/book/08-Plotting.html https://geocompr.robinlovelace.net/adv-map.html https://walker-data.com/census-r/mapping-census-data-with-r.html https://rspatial.org/terra/index.html https://rspatial.org/terra/rosu/index.html https://riatelab.github.io/mapsf/
IALE talk, landscapemetrics https://cran.r-project.org/package=landscapemetrics https://www.youtube.com/watch?v=jtAxJ-S89qI
OGH talk supercells https://cran.r-project.org/package=supercells https://cran.r-project.org/package=regional https://av.tib.eu/media/54880
RGS-IBG talk motif https://cran.r-project.org/package=motif https://www.youtube.com/watch?v=CDgsgcvsg_Y